Search CORE

1,358 research outputs found

Diffusion of Context and Credit Information in Markovian Models

Author: Bengio Y.
Frasconi P.
Publication venue
Publication date: 01/01/1995
Field of study

This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes very difficult the task of learning to represent long-term context for sequential data. This phenomenon hurts the forward propagation of long-term context information, as well as learning a hidden state representation to represent long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., the transition probability matrices are sparse and the model essentially deterministic. The results found in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

SNE: Signed Network Embedding

Author: F Heider
J Duchi
L Wu
LVD Maaten
R Collobert
Y Bengio
Y Bengio
Y LeCun
Y Yang
Publication venue
Publication date: 01/01/2017
Field of study

Several network embedding models have been developed for unsigned networks. However, these models based on skip-gram cannot be applied to signed networks because they can only deal with one type of link. In this paper, we present our signed network embedding model called SNE. Our SNE adopts the log-bilinear model, uses node representations of all nodes along a given path, and further incorporates two signed-type vectors to capture the positive or negative relationship of each edge along the path. We conduct two experiments, node classification and link prediction, on both directed and undirected signed networks and compare with four baselines including a matrix factorization method and three state-of-the-art unsigned network embedding models. The experimental results demonstrate the effectiveness of our signed network embedding.Comment: To appear in PAKDD 201

arXiv.org e-Print Archive

ScholarWorks@UARK

Crossref

UARK (University of Arkansas )

Part Detector Discovery in Deep Convolutional Neural Networks

Author: A Borji
C Cortes
J Liu
MD Zeiler
N Zhang
PF Felzenszwalb
Y Bengio
Y Bengio
Publication venue
Publication date: 14/11/2014
Field of study

Current fine-grained classification approaches often rely on a robust localization of object parts to extract localized feature representations suitable for discrimination. However, part localization is a challenging task due to the large variation of appearance and pose. In this paper, we show how pre-trained convolutional neural networks can be used for robust and efficient object part discovery and localization without the necessity to actually train the network on the current dataset. Our approach called "part detector discovery" (PDD) is based on analyzing the gradient maps of the network outputs and finding activation centers spatially related to annotated semantic parts or bounding boxes. This allows us not just to obtain excellent performance on the CUB200-2011 dataset, but in contrast to previous approaches also to perform detection and bird classification jointly without requiring a given bounding box annotation during testing and ground-truth parts during training. The code is available at http://www.inf-cv.uni-jena.de/part_discovery and https://github.com/cvjena/PartDetectorDisovery.Comment: Accepted for publication on Asian Conference on Computer Vision (ACCV) 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

On the Equivalence Between Deep NADE and Generative Stochastic Networks

Author: G.E. Hinton
G.E. Hinton
R.M. Neal
Y. Bengio
Y. LeCun
Publication venue
Publication date: 01/01/2014
Field of study

Neural Autoregressive Distribution Estimators (NADEs) have recently been shown as successful alternatives for modeling high dimensional multimodal distributions. One issue associated with NADEs is that they rely on a particular order of factorization for

P(\mathbf{x})

. This issue has been recently addressed by a variant of NADE called Orderless NADEs and its deeper version, Deep Orderless NADE. Orderless NADEs are trained based on a criterion that stochastically maximizes

P(\mathbf{x})

with all possible orders of factorizations. Unfortunately, ancestral sampling from deep NADE is very expensive, corresponding to running through a neural net separately predicting each of the visible variables given some others. This work makes a connection between this criterion and the training criterion for Generative Stochastic Networks (GSNs). It shows that training NADEs in this way also trains a GSN, which defines a Markov chain associated with the NADE model. Based on this connection, we show an alternative way to sample from a trained Orderless NADE that allows to trade-off computing time and quality of the samples: a 3 to 10-fold speedup (taking into account the waste due to correlations between consecutive samples of the chain) can be obtained without noticeably reducing the quality of the samples. This is achieved using a novel sampling procedure for GSNs called annealed GSN sampling, similar to tempering methods that combines fast mixing (obtained thanks to steps at high noise levels) with accurate samples (obtained thanks to steps at low noise levels).Comment: ECML/PKDD 201

arXiv.org e-Print Archive

Crossref

Generative Models For Deep Learning with Very Scarce Data

Author: CM Bishop
G Hinton
N Srivastava
Y Bengio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/03/2019
Field of study

The goal of this paper is to deal with a data scarcity scenario where deep learning techniques use to fail. We compare the use of two well established techniques, Restricted Boltzmann Machines and Variational Auto-encoders, as generative models in order to increase the training set in a classification framework. Essentially, we rely on Markov Chain Monte Carlo (MCMC) algorithms for generating new samples. We show that generalization can be improved comparing this methodology to other state-of-the-art techniques, e.g. semi-supervised learning with ladder networks. Furthermore, we show that RBM is better than VAE generating new samples for training a classifier with good generalization capabilities

arXiv.org e-Print Archive

Crossref

Semantic Wide and Deep Learning for Detecting Crisis-Information Categories on Social Media

Author: F Atefeh
H Gao
P Meier
TJ Campanella
Y Bengio
Y LeCun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

When crises hit, many flog to social media to share or consume information related to the event. Social media posts during crises tend to provide valuable reports on affected people, donation offers, help requests, advice provision, etc. Automatically identifying the category of information (e.g., reports on affected individuals, donations and volunteers) contained in these posts is vital for their efficient handling and consumption by effected communities and concerned organisations. In this paper, we introduce Sem-CNN; a wide and deep Convolutional Neural Network (CNN) model designed for identifying the category of information contained in crisis-related social media content. Unlike previous models, which mainly rely on the lexical representations of words in the text, the proposed model integrates an additional layer of semantics that represents the named entities in the text, into a wide and deep CNN network. Results show that the Sem-CNN model consistently outperforms the baselines which consist of statistical and non-semantic deep learning models

Crossref

Open Research Online (The Open University)

Label-Dependencies Aware Recurrent Neural Networks

Author: JL Elman
M Dinarelli
M Schuster
MI Jordan
MP Marcus
N Srivastava
P Werbos
R Collobert
R Mori De
S Hochreiter
Y Bengio
Y Bengio
Publication venue
Publication date: 06/06/2017
Field of study

In the last few years, Recurrent Neural Networks (RNNs) have proved effective on several NLP tasks. Despite such great success, their ability to model \emph{sequence labeling} is still limited. This lead research toward solutions where RNNs are combined with models which already proved effective in this domain, such as CRFs. In this work we propose a solution far simpler but very effective: an evolution of the simple Jordan RNN, where labels are re-injected as input into the network, and converted into embeddings, in the same way as words. We compare this RNN variant to all the other RNN models, Elman and Jordan RNN, LSTM and GRU, on two well-known tasks of Spoken Language Understanding (SLU). Thanks to label embeddings and their combination at the hidden layer, the proposed variant, which uses more parameters than Elman and Jordan RNNs, but far fewer than LSTM and GRU, is more effective than other RNNs, but also outperforms sophisticated CRF models.Comment: 22 pages, 3 figures. Accepted at CICling 2017 conference. Best Verifiability, Reproducibility, and Working Description awar

arXiv.org e-Print Archive

Crossref

Inducing Language Networks from Continuous Space Word Representations

Author: G.E. Hinton
K. Beyer
M. Sigman
R. Collobert
R.F.I. Cancho
Y. Bengio
Publication venue
Publication date: 01/01/2014
Field of study

Recent advancements in unsupervised feature learning have developed powerful latent representations of words. However, it is still not clear what makes one representation better than another and how we can learn the ideal representation. Understanding the structure of latent spaces attained is key to any future advancement in unsupervised learning. In this work, we introduce a new view of continuous space word representations as language networks. We explore two techniques to create language networks from learned features by inducing them for two popular word representation methods and examining the properties of their resulting networks. We find that the induced networks differ from other methods of creating language networks, and that they contain meaningful community structure.Comment: 14 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

What is Holding Back Convnets for Detection?

Author: D Hoiem
H Li
J Xu
M Everingham
P Agrawal
Y Bengio
Publication venue
Publication date: 01/01/2015
Field of study

Convolutional neural networks have recently shown excellent results in general object detection and many other tasks. Albeit very effective, they involve many user-defined design choices. In this paper we want to better understand these choices by inspecting two key aspects "what did the network learn?", and "what can the network learn?". We exploit new annotations (Pascal3D+), to enable a new empirical analysis of the R-CNN detector. Despite common belief, our results indicate that existing state-of-the-art convnet architectures are not invariant to various appearance factors. In fact, all considered networks have similar weak points which cannot be mitigated by simply increasing the training data (architectural changes are needed). We show that overall performance can improve when using image renderings for data augmentation. We report the best known results on the Pascal3D+ detection and view-point estimation tasks

arXiv.org e-Print Archive

Crossref

CISPA – Helmholtz-Zentrum für Informationssicherheit

MPG.PuRe

Bio-Inspired Multi-Layer Spiking Neural Network Extracts Discriminative Features from Speech Signals

Author: A Tavanaei
G Hinton
GR Doddington
JJ Wade
LR Rabiner
M Beyeler
N Kasabov
O Abdel-Hamid
S Ghosh-Dastidar
SG Wysoski
SR Kheradpisheh
T Masquelier
W Maass
Y Bengio
Y Bengio
Y Cao
Y Dan
Y LeCun
Y LeCun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/06/2017
Field of study

Spiking neural networks (SNNs) enable power-efficient implementations due to their sparse, spike-based coding scheme. This paper develops a bio-inspired SNN that uses unsupervised learning to extract discriminative features from speech signals, which can subsequently be used in a classifier. The architecture consists of a spiking convolutional/pooling layer followed by a fully connected spiking layer for feature discovery. The convolutional layer of leaky, integrate-and-fire (LIF) neurons represents primary acoustic features. The fully connected layer is equipped with a probabilistic spike-timing-dependent plasticity learning rule. This layer represents the discriminative features through probabilistic, LIF neurons. To assess the discriminative power of the learned features, they are used in a hidden Markov model (HMM) for spoken digit recognition. The experimental results show performance above 96% that compares favorably with popular statistical feature extraction methods. Our results provide a novel demonstration of unsupervised feature acquisition in an SNN

arXiv.org e-Print Archive

Crossref